2023-05-01
Welcome to Biology 395, Advanced Bioassessment. In this course we’ll:
Mainly, you’ll review my lectures outside of class in Canvas. Once you’ve reviewed them, you’ll come to class and we’ll practice what you saw in the lectures. There will be two exams, one lab report, and many small assignments based in R; together these make up your grade.
General outcomes:
Specific outcomes:
This course is modeled after a professional biologist’s year. It includes a field season toward the beginning of the semester, and we’ll analyze the data once the weather is no longer conducive to acquiring more.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggpubr)
library("FactoMineR")
library("factoextra")
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
You’ll see “chunks” like these throughout the course. This is actual R code, and it is meant to be a guide/reference so you can take it with you and use it as the basis of your own code in the future. These HTML slides are what you’ll see in class, but I will give you a PDF document of all the code at the end of the semester. I can also give you particular lectures in PDF format upon request.
Today, we’ll take a brief overview of what’s in the course, starting with conservation biology/genetics.
knitr::include_graphics("Conservationgenetics_model.jpeg")
We’ll look into the details of runoff sources and how they’re monitored, mainly in Virginia via the USGS and DEQ.
knitr::include_graphics("Runoff_sources.png")
We’ll also talk in depth about biological monitoring. This is really at the center of the course, as everything relies on our ability to generate indices of biological integrity.
Speaking of Indices of Biological Integrity (IBI), we’ll talk in depth about how these are generated, and how they’re applied. Indices of biological integrity have been around now for decades. These include but are not limited to:
The links above are really just references for you. We’ll talk in more detail about these later.
As indices matured, the EPA requested multimetric indices of biological integrity so that states could define their own metrics and determine which streams are impaired under section 303(d) of the Clean Water Act.
Virginia developed its own IBI, the Virginia Stream Condition Index (VSCI), which we’ll discuss in depth.
A need was also identified for a multimetric index for streams that are not of a “higher gradient.” Research through the mid-2000s produced the Coastal Plain Macroinvertebrate Index (CPMI).
We will talk about both of these in detail and learn how to analyze basic VSCI and CPMI datasets in R.
eDNA, or environmental DNA, is a tool used in lieu of physical biological assessment. It can detect cryptic, invasive, or endangered species without harming them with any sort of capture method. The method isolates DNA from a number of “matrices,” namely:
- water
- soil
- bone/excrement
knitr::include_graphics("edna_image.jpg")
Now, eDNA includes approaches as broad as metagenomics, where we isolate DNA from many species in one substrate (most commonly water) to look at what’s present in entire ecological communities.
Good review examples:
One of the first assessments necessary when working a field site for any biomonitoring effort is assessing physical habitat. The EPA developed rapid bioassessment tools that include physical habitat monitoring.
In normal site visits, you would look at every aspect of biomonitoring (Phab, fish, macroinvertebrates, water quality) at the same time, but since we’re learning, we’ll mostly take them one at a time, starting with physical habitat.
Subjectivity is a real issue here, as each scientist could assess physical habitat differently. As a result, the EPA created a standardized protocol in an effort to decrease variability.
knitr::include_graphics("EPA_physicalhabitatprotocol.png")
I also HIGHLY recommend taking a look at the Rapid Bioassessment Protocol from the EPA.
As a result, our aims in this presentation will be threefold:
This lecture is meant to review Physical habitat methods, and next week, we’ll go into the field. When we introduce R, I’ll show you how to import data, and then we’ll do a brief “analysis.”
There are 10 physical characteristics of streams we will look at. Each slide represents one of the ten aspects, and we’ll briefly discuss each before going into the field.
- Epifaunal Substrate and Available Cover
knitr::include_graphics("epifaunalsubstrate:availablecover.png")
knitr::include_graphics("embeddedness.png")
knitr::include_graphics("PoolSubstarte.png")
knitr::include_graphics("VelocityDepth.png")
knitr::include_graphics("Poolvariability.png")
knitr::include_graphics("Sedimentdeposition.png")
knitr::include_graphics("ChannelFlowStatus.png")
knitr::include_graphics("ChannelAlteration.png")
knitr::include_graphics("Frequencyriffles_bends.png")
knitr::include_graphics("ChannelSinusosity.png")
knitr::include_graphics("BankStability.png")
knitr::include_graphics("BankVegetativeProtection.png")
knitr::include_graphics("RiparianBufferZone.png")
The next steps will be actually acquiring these data at our field site (Buffal Creek - check). We’ll gather the data for the 10 characteristics we discussed and talk about what we think is good and/or bad for QA/QC purposes.
After that, during our R introduction, I will have you perform a small analysis of physical habitat data from Virginia DEQ to see if we can compute some basic statistics and notice anything interesting about the data.
To get R and RStudio (you’ll need both), you can go here
Please have this by next class (Aug 30). This will be one of your assignments.
R itself consists of an underlying engine that takes commands and provides feedback on these commands. Each command you give the R engine is either an:
- Expression

An expression is a statement that you give the R engine. R will evaluate the expression, give you the answer, and keep no reference to it for future use. Some examples include:
2 + 6
## [1] 8
sqrt(5)
## [1] 2.236068
3 * (pi/2) - 1
## [1] 3.712389
- Assignment
An assignment causes R to evaluate the expression and stores the result in a variable. This is important because you can use the variable in the future. An example of an assignment is:
x <- 2 + 6
myCoolVariable <- sqrt(5)
another_one_number23 <- 3 * (pi/2) - 1
x
## [1] 8
myCoolVariable
## [1] 2.236068
another_one_number23
## [1] 3.712389
There are thousands of potential functions in R and its associated packages. To use these functions, you need to understand the basic taxonomy of a function. A function has two parts:
- A unique name, and
- The stuff (e.g., variables) passed to it within the parentheses.
Not all functions require additional variables. For example, the function ls() shows which variables R currently has in memory and does not require any parameters. If you forget the parentheses and only use a function’s name, by default R will show you the code inside the function (unless it is a compiled function). This is because each function is also a variable, which is why you should not use function names for your variable names (see below for more on naming).
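As a quick illustrative sketch of these two points (the variable name `my_data` is my own, not from the lecture):

```r
# ls() takes no arguments: it lists the variables currently in memory
my_data <- 42
ls()            # the listing now includes "my_data"

# A function is itself a variable, so its bare name prints its definition
sd              # shows the body of the sd() function

# Shadowing a function name with data works, but invites confusion -- avoid it
sd <- 10        # "sd" is now also a number in our workspace
sd(c(1, 2, 3))  # still works: in a call, R looks for a *function* named sd
rm(sd)          # removes our variable; the original function is untouched
```

Note the last lines: R can usually untangle a shadowed function name, but relying on that is exactly the kind of habit the naming advice above warns against.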
To find the definition of a function, the arguments passed to it, details of the implementation, and some examples, you can use the ? shortcut. To find the definition for the sqrt() function type ?sqrt and R will provide you the documentation for that function.
Functions may have more than one parameter passed to them. Often, when there are many parameters, some default values are provided. For example, the log() function computes logarithms. The definition of the log function shows log(x, base = exp(1)) (say, from ?log). Playing around with the function shows:
log(2)
## [1] 0.6931472
log(2, base = 2)
## [1] 1
log(2, base = 10)
## [1] 0.30103
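As an aside not in the lecture text: base R also ships shortcut functions for the two most common bases, which are equivalent to setting the base argument explicitly:

```r
log(8, base = 2)     # base-2 logarithm of 8
log2(8)              # the same result via the base-2 shortcut
log(100, base = 10)  # base-10 logarithm of 100
log10(100)           # the same result via the base-10 shortcut
log(exp(1))          # the default base is e = exp(1), so this is 1
```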
R recognizes over a dozen different types of data. All of the data types are characterized by what R calls classes. To determine the type of any variable you can use the built-in function class(x). This will tell you what kind of variable x is. What follows are some of the more common data types.
Numeric types represent the majority of numerical valued items you will deal with. When you assign a number to a variable in R, it will most likely be a numeric type. Numeric data types are displayed with or without decimal places depending on whether the value(s) include a decimal portion. In fact, R makes any assignment of a numerical value a numeric by default. For example:
x <- 4
class(x)
## [1] "numeric"
x
## [1] 4
x <- numeric(4)
x
## [1] 0 0 0 0
x[1] <- 2.4
x
## [1] 2.4 0.0 0.0 0.0
Notice this is an all-or-nothing deal: each element of a vector must be the same type, and the default value for the numeric data type is zero. Also notice (especially those who have some experience programming in other languages) that dimensions in vectors (and matrices) start at 1 rather than 0. Operations on numeric types proceed as you would expect, but since numeric is the default type, you don’t really have to go around using the as.numeric(x) function. For example:
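To make the all-or-nothing typing concrete, here is a short sketch (the variable name and values are my own):

```r
# Every element of a vector must share one type
v <- c(1.5, 2, 3)
class(v)        # "numeric"

v[4] <- "four"  # inserting a character coerces the ENTIRE vector
class(v)        # "character"
v               # "1.5" "2"  "3"  "four"

# Indexing starts at 1, not 0
v[1]            # "1.5"
```

The silent coercion in the middle step is a common source of bugs when reading messy field data, so it is worth checking class() after any import.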
is.numeric(2.4)
## [1] TRUE
as.numeric(2) + 0.4
## [1] 2.4
2 + 0.4
## [1] 2.4
Word of Caution: You need to be careful when dealing with floating point numbers, due in part to how computers store these numbers, how they are presented to us in the R interface, and how logical operations on them behave. Consider the following case. The ancient Egyptians approximated π as the ratio 256/81.
e.pi <- 256/81 e.pi
## [1] 3.160494
Word of Caution, cont’d: Very nice, and apparently close enough to 3.1416 that they could get work done. Now, as we all know, the value of π is the ratio of a circle’s circumference to its diameter. We also know that it is a transcendental number (i.e., one that cannot be produced using finite algebraic operations) and its decimal values never repeat.
print(e.pi, digits = 20)
## [1] 3.1604938271604936517
Word of Caution, cont’d: There is another issue you need to be careful with: how a computer stores numerical values. Consider the following:
x <- 0.3/3
x
## [1] 0.1
print(x, digits = 20)
## [1] 0.099999999999999991673
Why the difference? A computer deals in binary (0/1) representations and as such has a limited ability for precision, particularly for very large or very small numbers. Usually this does not cause much of a problem, but when you begin to work at crafting analyses, you should be aware of this drawback.
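One practical consequence worth sketching here (my own addition, using base R’s all.equal()): never compare floating point values with ==; compare within a tolerance instead.

```r
x <- 0.3 / 3
x == 0.1                   # FALSE -- exact binary comparison fails
abs(x - 0.1) < 1e-9        # TRUE  -- compare within a tolerance instead
isTRUE(all.equal(x, 0.1))  # TRUE  -- base R's tolerance-based comparison
```

This matters when you begin filtering data on numeric thresholds: a stream metric that “equals” 0.1 on paper may fail an == test in code.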
The character data type is the one that handles letters and letter-like representations of numbers. For example, observe the following:
x <- "If you can read this, you are beginning to take a step into a larger world."
class(x)
## [1] "character"
length(x)
## [1] 1
Notice here how the variable x has a length of one, even though there are 75 characters within that string. If you want to know the number of characters, you need to use the nchar() function; otherwise length() tells you the ‘vector length’ (see below) of the variable.
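The length()/nchar() distinction is easiest to see side by side; the vector of stream-term strings below is my own example:

```r
x <- "If you can read this, you are beginning to take a step into a larger world."
length(x)  # 1  -- one element in the vector
nchar(x)   # 75 -- characters in that one string

# With several strings, the distinction is clearer:
streams <- c("riffle", "run", "pool")
length(streams)  # 3       -- three elements
nchar(streams)   # 6 3 4   -- characters per element
```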
y <- 23
class(y)
## [1] "numeric"
z <- as.character(y)
z
## [1] "23"
class(z)
## [1] "character"
Notice how the variable y was initially a numeric type, but using the as.character(y) function we can coerce it into a non-numeric representation of the number. Combining character variables can be done with the paste() function to ‘paste together’ a string of characters (n.b., note the optional sep argument).
w <- "cannot"
x <- "I"
y <- "can"
z <- "code in R"
paste(x, w, z)
## [1] "I cannot code in R"
paste(x, y, z)
## [1] "I can code in R"
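Since the optional sep argument was mentioned above, here is a short sketch of how it (and the related paste0() function) behaves; the strings are illustrative:

```r
x <- "I"
y <- "can"
z <- "code"
paste(x, y, z)             # default sep is a single space: "I can code"
paste(x, y, z, sep = "_")  # custom separator:             "I_can_code"
paste0(x, y, z)            # no separator at all:          "Icancode"
```

paste0() is handy for building file names and column labels, where stray spaces would cause trouble.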